ABSTRACT

In this paper, we investigate the linearity versus non-linearity of the Large Magellanic Cloud (LMC)
Cepheid period-luminosity (P-L) relation using two statistical approaches not previously applied to
this problem: the testimator method and the Schwarz Information Criterion (SIC). The testimator
method is extended to multiple stages for the first time and is shown to be unbiased; the variance of
its estimated slope can be proved to be smaller than that of the standard slope estimated from linear
regression theory. The Schwarz Information Criterion (also known as the Bayesian Information
Criterion) is more conservative than the Akaike Information Criterion and tends to choose lower-order
models. Using simulated data sets, we verify that these statistical techniques can be used to detect
intrinsically linear and non-linear P-L relations. These methods are then applied to independent LMC
Cepheid data sets from the OGLE project and the MACHO project. Our results imply that there
is a change of slope at longer periods for all of the data sets. This strongly supports previous
results, obtained from independent statistical tests, that the observed LMC P-L relation is non-linear
with a break period at or around 10 days.

Subject headings: Cepheids - distance scale - Stars: fundamental parameters - methods: statistical

1. INTRODUCTION

The cornerstone of the extra-galactic distance scale is the Cepheid Period-Luminosity (P-L)
relation defined by the Large Magellanic Cloud (LMC) Cepheids. The LMC Cepheid P-L relation is
usually assumed to be linear in log(P), with P the pulsation period in days, but this assumption
has been under debate due to recent results suggesting that the relation could be non-linear
(Tammann & Reindl 2002; Kanbur & Ngeow 2004; Sandage et al. 2004; Ngeow et al. 2005). These
authors contended that the existing Cepheid data in the LMC strongly suggest that the LMC P-L
relation is consistent with two lines of significantly differing slopes, with a break at or around a
period of 10 days. This is referred to as the non-linearity of the Cepheid P-L relation in this paper.
Arguments for choosing the fiducial period at 10 days can be found in Kanbur & Ngeow (2004),
Sandage et al. (2004), Ngeow et al. (2005) and Ngeow & Kanbur (2006a). Furthermore,
Kanbur & Ngeow (2004, 2006), Sandage et al. (2004), Ngeow et al. (2005) and Ngeow & Kanbur
(2006c) examined various factors that may cause the non-linearity of the LMC P-L relation,
including the observing strategies, photometric errors, extinction errors, removal of outliers,
sample selection, the number of long period Cepheids in the samples and contamination by overtone
Cepheids. They found that none of these factors, nor any combination of them, could be responsible
for the observed non-linear LMC P-L relation. As argued in Ngeow & Kanbur (2006d), rigorous
statistical tests are needed to test the linearity versus the non-linearity of the LMC P-L relation.

In our previous studies, the F-test (e.g. Weisberg 1980) has been applied to the OGLE (Optical
Gravitational Lensing Experiment) and MACHO (MAssive Compact Halo Objects project) Cepheid data,
in Kanbur & Ngeow (2004) and Ngeow et al. (2005) respectively, to test for the non-linearity of the
LMC P-L relation. In such a formulation, the full and reduced models have four and two parameters,
respectively. The test looks at the change in the mean residual sum of squares between the full and
reduced models, divided by the mean residual sum of squares in the full model (see equation [5] of
Kanbur & Ngeow 2004). This test statistic can be formulated as the difference between the short and
long period slopes divided by the standard error of that difference. Hence, if the number and nature
of the long or short period data are such that the corresponding slope is estimated with a large
error, then the F-value will be low and the test will return a non-significant result. Thus the
F-test is sensitive to the number of data points on either side of the period cut at 10 days. The
OGLE and MACHO data sets we used in Kanbur & Ngeow (2004) and Ngeow et al. (2005), respectively, do
have adequate numbers of long and short period Cepheids for the application of the F-test. The
F-test returned a significant result when testing the non-linearity of the P-L relation in both of
the data sets.
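The F statistic described above can be sketched as follows. This is a minimal illustration of the standard nested-model F-test (a two-line, four-parameter fit against a one-line, two-parameter fit); it is not claimed to reproduce the exact form of equation (5) of Kanbur & Ngeow (2004). The function names and the default break at log(P) = 1.0 are assumptions made for this example, and each side of the break is assumed to hold at least three points.

```python
def ols_rss(x, y):
    """Fit y = a + b*x by least squares; return the residual sum of squares."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    a = ym - b * xm
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def f_statistic(log_p, mag, log_p_break=1.0):
    """F statistic for a two-line (full) versus one-line (reduced) P-L fit."""
    short = [(x, y) for x, y in zip(log_p, mag) if x < log_p_break]
    long_ = [(x, y) for x, y in zip(log_p, mag) if x >= log_p_break]
    rss_reduced = ols_rss(log_p, mag)                       # 2 parameters
    rss_full = ols_rss(*zip(*short)) + ols_rss(*zip(*long_))  # 4 parameters
    n = len(log_p)
    # Change in RSS per extra parameter over the mean RSS of the full model.
    return ((rss_reduced - rss_full) / 2.0) / (rss_full / (n - 4))
```

The returned value would then be compared with the critical value of the F-distribution with 2 and N - 4 degrees of freedom at the chosen significance level.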

Nevertheless, the results suggesting a non-linear LMC P-L relation are still controversial. We
emphasize that statistical tests are needed; however, claims of a linear LMC P-L relation in the
literature lack rigorous statistical tests. In this paper, we apply two additional statistical
tests, the testimator method and the Schwarz Information Criterion method, to examine the
non-linearity of the LMC Cepheid P-L relation. These tests are complementary to the F-test carried
out in previous studies, since they serve to check and verify the results obtained from the F-test.
In this way previous conclusions about the non-linear LMC P-L relation are considerably
strengthened. Furthermore, both the testimator and Schwarz Information Criterion methods are able to
estimate the break period without any a priori assumption: recall that in previous work, a break
period at 10 days is usually adopted. These two methods can be applied not only to Cepheid studies,
as in this paper, but also to other astronomical and astrophysical hypothesis testing problems. We
also emphasize that our use of the testimator has, for the first time, been generalized to more than
two stages and hence is also a statistical result in its own right. In the next section, we outline
these techniques in detail and their application to our problem. In Section 3 we apply these methods
to LMC Cepheid data and present our results. The conclusions and discussion are given in the last
section.

2. THE STATISTICAL METHODS 

2.1. The Testimator 

The concept of a testimator (or test estimator) was first proposed by Bancroft (1944) in the
context of estimating a parameter where a prior guess is used in place of the estimator of an
unknown parameter. The testimator can be applied if the prior guess for the unknown parameter can be
ascertained by a test of hypothesis; otherwise the traditional estimator is used. Due to its
superior efficiency compared to traditional estimators, the testimator method has been adapted and
refined to suit other
situations by Paul (1950), Huntsberger (1955), Bancroft (1964), Arnold & Katti (1972),
Bock et al. (1973), Han (19??), Ghosh & Sinha (1988), Yancey et al. (1989), Pandey & Malik (1990),
Pandey (1995), Pandey et al. (1997) and Pandey & Srivastava (2001), to name a few.



Waikar et al. (1984) and Waikar et al. (2002), in work on two-stage shrinkage estimation, proposed
a weighted testimator by placing a weight $1 - k$ on the prior guess and a weight $k$ on the
traditional estimator, where $k$ is the probability that the guess will be true. They showed that
the testimators have far superior efficiency and are therefore more powerful in estimating unknown
parameters. This weighted two-stage testimation concept can be extended to cover multiple stages. In
this paper we apply this "weighted" testimator to investigate the non-linearity of the LMC Cepheid
P-L relation, as mentioned in the Introduction.

The two-stage testimator method is summarized as follows. For a linear regression of the form
$y = \beta x + a$, the usual least-squares estimate of the slope from $N$ data points is given as

$$\hat{\beta} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \tag{1}$$

where $\bar{x} = N^{-1}\sum x_i$ and $\bar{y} = N^{-1}\sum y_i$ are the mean values of $x$ and $y$,
respectively. In the standard hypothesis testing procedure, the null and alternate hypotheses are
constructed as $H_0: \beta = \beta_0$ and $H_a: \beta \neq \beta_0$, respectively, where $\beta_0$
is the assumed value of the (true) slope given the prior knowledge of the slope. For example,
$\beta_0$ can be predicted from theoretical calculations. If the (true) variance of the slope is
known, the $z$-statistical test (with the normal distribution) can be applied; otherwise the
$t$-statistical test (with the $t$-distribution) is used for the hypothesis testing. In general the
variance is not known, therefore we adopt the $t$-statistical test in this paper. If the null
hypothesis is accepted, the testimator of the slope, $\hat{\beta}_w$, is calculated as
(Waikar et al. 1984):

$$\hat{\beta}_w = k\hat{\beta} + (1 - k)\beta_0. \tag{2}$$

The constant $k$ in the above equation is defined as

$$k = \frac{|t_{\rm observed}|}{t_{\rm critical}} = \frac{|\hat{\beta} - \beta_0|}{t_{\alpha/2,\nu}\sqrt{MSE/S_{xx}}}, \tag{3}$$

where $MSE = (N-2)^{-1}\sum_{i=1}^{N}(y_i - \hat{a} - \hat{\beta}x_i)^2$,
$S_{xx} = \sum_{i=1}^{N}(x_i - \bar{x})^2$ and $t_{\alpha/2,\nu}$ is the $t$-value for the
$100(1 - \alpha/2)\%$ confidence interval obtained from the associated $t$-distribution table with
$\nu = N - 2$ degrees of freedom. Note that the null hypothesis is rejected if $k > 1$. The
properties of the testimator are such that:

1. The testimator is an unbiased estimator under $H_0$.

2. The testimator has a smaller variance than the usual least-squares estimator, that is
$\mathrm{Var}(\hat{\beta}_w) < \mathrm{Var}(\hat{\beta})$.

The proofs for these two properties are given in the Appendix.

2.1.1. Application to the Cepheid P-L Relation 

The motivation of this paper is to apply the testimator method to detect any non-linearity in the
LMC P-L relation; this has been detected using the F-test (Kanbur & Ngeow 2004; Ngeow et al. 2005).
To study any possible variation in slope as the period increases through 10 days, we first sorted
the data according to period, from the shortest to the longest period in log(P). The sorted sample
is then divided into $m$ non-overlapping, and hence independent, subsets according to the Cepheid
period. The purpose is to make the bivariate observations independent for each of the subsets. Each
subset contains $n$ Cepheids (if the number of data points in the last subset is small, the last
subset is combined with the previous subset). This enables us to apply the testimator method in
multiple stages, together with a conservative Bonferroni testing procedure^1, for detecting such a
slope variation in the sample.

^1 The Bonferroni testing procedure states that for testing $n_g$ hypotheses, the confidence
coefficient $(1 - \alpha/2)$ is replaced by $(1 - \alpha/2n_g)$ in each of the hypothesis tests.
This ensures that the overall confidence coefficient will not be less than the original desired
value of $(1 - \alpha/2)$.

Fig. 1. - Illustration of the testimator procedure. The Cepheid data points are sorted according to
log(P) and divided into several subsets. In the first round, the slopes from the first and second
subsets are compared in a hypothesis test. The testimator, $\hat{\beta}_w$, is calculated if the
null hypothesis is accepted. In the second round, the testimator from the previous round is compared
with the estimated slope from the third subset. This is repeated until all the subsets have been
used or the null hypothesis is rejected.

In essence, the line of attack is to compute the slope of the first subset and compare it with the
slope of the next subset. If the two slopes are "similar", we compare the slope of the third subset
with the smoothed slope obtained from a combination of all the previous subsets. Hence, at the
$i$th round, the slope of a given subset of the data is computed and compared with the smoothed
slope from the testimator of all the previous data points. If the two slopes are statistically
equivalent, then the current subset of data is incorporated into the computation of the smoothed
slope, which is then compared with the slope of the next subset of data. This smoothness is an
important feature since it helps to alleviate, to some extent, the influence of outliers. However,
if the slopes are "different", i.e. the null hypothesis is rejected, then there is an indication of
a slope change in the P-L relation. There will therefore be a total of $n_g = m - 1$ hypothesis
tests in the multi-stage testimator procedure. In short, the algorithm for applying the testimator
method in our case can be summarized as follows:

a. In the first round, the slope of the first subset, $\beta_1$, is calculated and denoted by
$\hat{\beta}_1 = \beta_0$. The slope of the second subset is then compared to $\beta_0$ under
the null hypothesis $H_0: \beta_2 = \beta_0$ and the alternate hypothesis
$H_a: \beta_2 \neq \beta_0$. If $H_0$ is accepted, then the testimator in this round,
$\hat{\beta}_w^{1}$, is calculated using equation (2).

b. In the second round, the slope of the third subset, $\beta_3$, is calculated and denoted by
$\hat{\beta}_3$. The testimator from the first round, represented as $\hat{\beta}_w^{1} = \beta_0$,
is used in the hypothesis testing for this round. The null and alternate hypotheses in this round
become $H_0: \beta_3 = \beta_0$ and $H_a: \beta_3 \neq \beta_0$. If $H_0$ is accepted, a new
testimator, $\hat{\beta}_w^{2}$, is calculated using equation (2).

c. In the $i$th round, the slope of the $(i+1)$th subset, $\beta_{i+1}$, estimated by
$\hat{\beta}_{i+1}$, is calculated. The testimator from the previous, $(i-1)$th, round is denoted
as $\hat{\beta}_w^{i-1} = \beta_0$. The null and alternate hypotheses in this round become
$H_0: \beta_{i+1} = \beta_0$ and $H_a: \beta_{i+1} \neq \beta_0$. If $H_0$ is accepted, then
$\hat{\beta}_w^{i} = k\hat{\beta}_{i+1} + (1 - k)\beta_0$, with $k$ recomputed from equation (3).

d. This is repeated until the $i = n_g$ round or until the null hypothesis is rejected in the
$i$th round, which indicates a change in slope for the $(i+1)$th subset.

e. Since in principle there will be a total of $n_g$ hypothesis tests, the Bonferroni testing
procedure requires that $t_{\rm critical} = t_{\alpha/2n_g,\nu}$ in each round.

Throughout the paper, we adopt $\alpha = 0.05$ to ensure that the overall confidence level is more
than 95% in our test. The first two rounds of our testimator procedure for studying the possible
non-linear LMC P-L relation are illustrated in Figure 1.
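Steps a to e above can be sketched as a short loop. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a single Bonferroni-corrected critical value $t_{\alpha/2n_g,\nu}$ is passed in by the caller and reused in every round (in the tables the value can differ slightly when a subset has a different size), and the rule for merging a "small" last subset (fewer than half the nominal size) is an assumption of this sketch.

```python
import math


def fit_slope(points):
    """Least-squares slope and its standard error for a list of (x, y) pairs."""
    n = len(points)
    xm = sum(x for x, _ in points) / n
    ym = sum(y for _, y in points) / n
    sxx = sum((x - xm) ** 2 for x, _ in points)
    slope = sum((x - xm) * (y - ym) for x, y in points) / sxx
    mse = sum((y - ym - slope * (x - xm)) ** 2 for x, y in points) / (n - 2)
    return slope, math.sqrt(mse / sxx)


def multistage_testimator(log_p, mag, n_per_subset, t_critical):
    """Return (number of the first rejected subset or None, last smoothed slope)."""
    data = sorted(zip(log_p, mag))                      # sort by period
    subsets = [data[i:i + n_per_subset]
               for i in range(0, len(data), n_per_subset)]
    if len(subsets) > 1 and len(subsets[-1]) < n_per_subset // 2:
        subsets[-2].extend(subsets.pop())               # merge a small last subset
    beta0, _ = fit_slope(subsets[0])                    # step a: first-subset slope
    for number, subset in enumerate(subsets[1:], start=2):
        slope, err = fit_slope(subset)
        k = abs(slope - beta0) / (t_critical * err)     # equation (3)
        if k > 1.0:
            return number, beta0                        # H0 rejected: slope change
        beta0 = k * slope + (1.0 - k) * beta0           # equation (2), next round
    return None, beta0
```

A return value of `(None, slope)` corresponds to a sample with no detected slope change; otherwise the first element is the subset in which the change is flagged.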

In order to demonstrate the reliability of this procedure, we apply the testimator method to two
simulated data sets: one built from a non-linear P-L relation with a break at 10 days and another
built from a linear P-L relation. For demonstration purposes, we select one set of simulated data
(out of many simulations) in each case as a representative for testing the testimator method. The
plots of these two fake data sets, each containing 1500 data points, can be found in figure 1 of
Ngeow & Kanbur (2006c). Full details of our procedure for developing these two "fake" data sets are
described in Ngeow & Kanbur (2006c). The results of applying the testimator procedure to these two
fake data sets are given in Table 1. In this table, column 1 denotes the subsets; column 2 gives the
range of the period in each subset; column 3 lists the number of data points, $n$, in each subset;
columns 4 and 5 are the fitted slope in each subset and the assigned value of $\beta_0$ used in the
hypothesis test; columns 6 and 7 are the observed and critical $t$-values for each hypothesis test;
columns 8 and 9 are the corresponding $k$-value and the outcome of the hypothesis test; finally,
column 10 is the value of the testimator if the null hypothesis is accepted. Since we know which
fake data set is intrinsically linear and which is non-linear from the construction of the P-L
relation, we can verify the results in Table 1. For the fake data with a linear P-L relation, our
testimator results show that the slope of each subset is consistent with the smoothed slope from the
previous subsets, and the hypothesis tests correctly indicate that there is no change in slope
across all the period ranges. In contrast, the hypothesis tests for the fake data with a non-linear
P-L relation show that subset 7 has a different slope from the previous subsets, which indicates a
change of slope in this subset. Furthermore, the testimator procedure also correctly picks up the
"break period" in subset 7, which brackets the input break period at
10 days, from the outcome of the hypothesis testing. Therefore the testimator method can pick up a
P-L relation that is intrinsically non-linear.

TABLE 1
Testimator results for the fake data sets.

Subset  Period range   n    $\hat{\beta}$    $\beta_0$  $|t_{\rm observed}|$  $t_{\rm critical}$  $k$    Decision   $\hat{\beta}_w$
(1)     (2)            (3)  (4)              (5)        (6)                   (7)                 (8)    (9)        (10)

"Fake" data set from a linear P-L relation
1       0.2315-0.4421  200  -2.182 ± 0.403   ...        ...                   ...                 ...    ...        ...
2       0.4422-0.5005  200  -3.658 ± 0.949   -2.182     1.556                 2.718               0.572  accept H0  -3.027
3       0.5006-0.5508  200  -1.955 ± 1.128   -3.027     0.951                 2.718               0.350  accept H0  -2.652
4       0.5512-0.6079  200  -3.006 ± 1.025   -2.652     0.345                 2.718               0.127  accept H0  -2.697
5       0.6080-0.7349  200  -2.733 ± 0.442   -2.697     0.081                 2.718               0.030  accept H0  -2.698
6       0.7349-0.9610  200  -2.841 ± 0.234   -2.698     0.611                 2.718               0.225  accept H0  -2.730
7       0.9610-1.3553  200  -2.493 ± 0.155   -2.730     1.531                 2.718               0.563  accept H0  -2.597
8       1.3652-2.6170  100  -2.684 ± 0.095   -2.597     0.921                 2.748               0.335  accept H0  -2.626

"Fake" data set from a non-linear P-L relation
1       0.2315-0.4421  200  -2.442 ± 0.403   ...        ...                   ...                 ...    ...        ...
2       0.4422-0.5005  200  -3.918 ± 0.949   -2.442     1.556                 2.718               0.572  accept H0  -3.287
3       0.5006-0.5508  200  -2.215 ± 1.128   -3.287     0.951                 2.718               0.350  accept H0  -2.912
4       0.5512-0.6079  200  -3.266 ± 1.025   -2.912     0.345                 2.718               0.127  accept H0  -2.957
5       0.6080-0.7349  200  -2.993 ± 0.442   -2.957     0.081                 2.718               0.030  accept H0  -2.958
6       0.7349-0.9610  200  -3.101 ± 0.234   -2.958     0.611                 2.718               0.225  accept H0  -2.990
7       0.9610-1.3553  200  -2.170 ± 0.155   -2.990     5.281                 2.718               1.943  reject H0  ...

Note. - See text for the description of each column. Period ranges are given in log(P).
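As a consistency check, two rows of Table 1 can be reproduced by hand from equations (2) and (3), using only the rounded values printed in the table and assuming that the quoted uncertainty on each fitted slope is its standard error $\sqrt{MSE/S_{xx}}$. Small differences in the last digit relative to the table are expected from that rounding.

```latex
% Linear fake data, subset 2 (H_0 accepted)
\begin{align*}
|t_{\rm observed}| &= \frac{|-3.658-(-2.182)|}{0.949} \approx 1.555
   && \text{(Table 1: 1.556)} \\
k &= \frac{1.555}{2.718} \approx 0.572
   && \text{(Table 1: 0.572)} \\
\hat{\beta}_w &= 0.572\,(-3.658) + 0.428\,(-2.182) \approx -3.026
   && \text{(Table 1: } -3.027\text{)}
\end{align*}

% Non-linear fake data, subset 7 (H_0 rejected because k > 1)
\begin{align*}
|t_{\rm observed}| &= \frac{|-2.170-(-2.990)|}{0.155} \approx 5.290
   && \text{(Table 1: 5.281)} \\
k &= \frac{5.290}{2.718} \approx 1.946 > 1
   && \text{(Table 1: 1.943)}
\end{align*}
```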

2.2. The Schwarz Information Criterion 

The problem of deciding whether the LMC Cepheid data are more consistent with two lines of
significantly different slopes than with a single line is exactly analogous to deciding the
dimensionality of the model that will fit the given LMC Cepheid data. The method of maximizing the
likelihood tends to choose the highest possible dimension. Akaike (1974) suggested maximizing the
likelihood subject to a penalty depending on the dimensionality of the model under consideration
(Akaike Information Criterion, AIC): $AIC = -2\ln L + 2k_p$, where $L$ is the likelihood function
of the model of dimension $k_p$ (see Takeuchi 2000 for an example of its application in astronomy).
However, Schwarz (1978) showed that maximum likelihood estimators can be obtained from large sample
limits of Bayes estimates for certain classes of a priori distributions. These distributions only
put positive probability on the subspaces of the parameter space corresponding to the competing
models. Schwarz (1978) derived the following criterion (Schwarz Information Criterion, SIC;
sometimes also referred to as the Bayesian Information Criterion, BIC, in the literature): choose
the model for which

$$SIC = -2\ln L + k_p \ln N \tag{4}$$

is a minimum, where $N$ is the total number of data points and $k_p = p + 1$ (with $p$ being the
number of fitted parameters, see Schwarz 1978). Some uses of the BIC for model selection in the
astronomical and astrophysical literature can be found, for example, in Arentoft et al. (2001),
Handler et al. (2000, 20??), Koen (1996, 1999, 2006), Koen & Schumann (1999), Koen & Laney (2000),
Koen & Lombard (1993, 2003), Liddle (2004, 2007), Mukherjee et al. (1998), Porciani & Norberg (2006)
and Sterken et al. (1999).
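To make the difference between the two penalties concrete, the short sketch below evaluates the extra penalty each criterion charges for moving from a model with $k_p = 3$ to one with $k_p = 5$ (the two dimensions used for the P-L models in this paper) at N = 1500, the size of the fake data sets. The function names are illustrative only, and the log-likelihood is set to zero so that only the penalty terms are compared.

```python
import math


def aic(log_likelihood, k_p):
    """Akaike Information Criterion: -2 ln L + 2 k_p."""
    return -2.0 * log_likelihood + 2.0 * k_p


def sic(log_likelihood, k_p, n):
    """Schwarz Information Criterion, equation (4): -2 ln L + k_p ln N."""
    return -2.0 * log_likelihood + k_p * math.log(n)


# Extra penalty for the 5-dimensional model relative to the 3-dimensional one.
n = 1500
extra_aic = aic(0.0, 5) - aic(0.0, 3)        # 2 * (5 - 3) = 4
extra_sic = sic(0.0, 5, n) - sic(0.0, 3, n)  # (5 - 3) * ln(1500), about 14.6
```

Because ln N exceeds 2 for any sample with more than about 7 points, the SIC penalizes additional parameters more heavily than the AIC for data sets of this size.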



2.2.1. Application to the Cepheid P-L relation 

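To test the non-linearity of the Cepheid P-L relation with the SIC method, we consider a model with
a linear P-L relation (the null hypothesis) and a model with a non-linear P-L relation with a break
period (in days) at $P_0$ (the alternate hypothesis). For the former case, we have:

$$H_0: m = \hat{m} = \beta\log(P) + a, \quad \text{with } \sigma^2 = \frac{1}{N-2}\sum_{i=1}^{N}(m_i - \hat{m}_i)^2,$$

$$L = \frac{1}{(\sqrt{2\pi})^{N}\sigma^{N}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(m_i - \hat{m}_i)^2\right],$$

$$SIC(H_0) = -2\ln L + 3\ln N.$$

Similarly, for the alternate model, we have:

$$H_A: m = \hat{m} = \begin{cases}
\beta_S\log(P) + a_S, & \text{for } \log(P) < \log(P_0), \text{ with } \sigma_S^2 = \frac{1}{N_S-2}\sum_{i=1}^{N_S}(m_i - \hat{m}_i)^2, \\
\beta_L\log(P) + a_L, & \text{for } \log(P) \geq \log(P_0), \text{ with } \sigma_L^2 = \frac{1}{N_L-2}\sum_{i=1}^{N_L}(m_i - \hat{m}_i)^2,
\end{cases}$$

$$L = \frac{1}{(\sqrt{2\pi})^{N}(\sigma_S)^{N_S}(\sigma_L)^{N_L}} \exp\left[-\frac{1}{2\sigma_S^2}\sum_{i=1}^{N_S}(m_i - \hat{m}_i)^2 - \frac{1}{2\sigma_L^2}\sum_{i=1}^{N_L}(m_i - \hat{m}_i)^2\right],$$

$$SIC(H_A) = -2\ln L + 5\ln N.$$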

In these expressions, $N_S + N_L = N$ and $m$ is the observed magnitude after correcting for
extinction. The slope ($\beta$) and zero-point ($a$) parameters in the above models are obtained
from maximum likelihood estimation (MLE, which is equivalent to standard least-squares estimation
in our case). Note that the sample variance ($\sigma^2$) from MLE is a biased estimate; we corrected
the bias by using $N_{(L,S)} - 2$ degrees of freedom. For the alternate models, $SIC(H_A)$ is
calculated over a range of $\log(P_0)$ that increments in steps of, for example, 0.001. Therefore, a
model with a linear P-L relation and a range of models with non-linear P-L relations at different
break periods are tested with the SIC method. The model with the smallest SIC value is the preferred
model. If $SIC(H_A) < SIC(H_0)$, the minimum value of $SIC(H_A)$ not only suggests that the P-L
relation is non-linear, but also gives an estimate of the break period.
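The grid search over candidate break periods can be sketched as follows. This is a minimal illustration that assumes Gaussian residuals with the bias-corrected variance described above; the function names, the `min_points` guard (which skips candidate breaks that leave too few points on one side) and the choice of scanning between the shortest and longest observed periods are assumptions of this sketch, not details given in the text.

```python
import math


def fit_rss(points):
    """Residual sum of squares of a least-squares line through (x, y) pairs."""
    n = len(points)
    xm = sum(x for x, _ in points) / n
    ym = sum(y for _, y in points) / n
    sxx = sum((x - xm) ** 2 for x, _ in points)
    slope = sum((x - xm) * (y - ym) for x, y in points) / sxx
    return sum((y - ym - slope * (x - xm)) ** 2 for x, y in points)


def neg2_log_like(points):
    """-2 ln L for Gaussian residuals with sigma^2 = RSS / (n - 2)."""
    n = len(points)
    rss = fit_rss(points)
    sigma2 = rss / (n - 2)
    return n * math.log(2.0 * math.pi * sigma2) + rss / sigma2


def sic_search(log_p, mag, step=0.001, min_points=10):
    """Return (SIC(H0), minimum SIC(HA), log P0 at which that minimum occurs)."""
    data = sorted(zip(log_p, mag))
    n = len(data)
    sic_h0 = neg2_log_like(data) + 3.0 * math.log(n)
    best_sic, best_log_p0 = math.inf, None
    n_steps = int((data[-1][0] - data[0][0]) / step)
    for j in range(1, n_steps):
        log_p0 = data[0][0] + j * step
        short = [p for p in data if p[0] < log_p0]
        long_ = [p for p in data if p[0] >= log_p0]
        if len(short) < min_points or len(long_) < min_points:
            continue
        # The two segments are independent, so their -2 ln L values add.
        sic_ha = neg2_log_like(short) + neg2_log_like(long_) + 5.0 * math.log(n)
        if sic_ha < best_sic:
            best_sic, best_log_p0 = sic_ha, log_p0
    return sic_h0, best_sic, best_log_p0
```

A non-linear relation is preferred when the second returned value is smaller than the first, in which case the third value is the estimated break period in log(P).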

To test the SIC method, the same simulated data sets as in the case of the testimator method in
Section 2.1.1 were used. For the "fake" data set with a linear P-L relation, the values of
$SIC(H_0)$ and $SIC(H_A)$ are -164.65 and -161.21, respectively. For the "fake" data set with a
non-linear P-L relation, we found $SIC(H_0) = -100.56$, and the minimum value of
$SIC(H_A) = -154.62$ occurs at $\log(P_0) = 0.983$. We then tested the SIC method further with
various simulations. We first ran two sets of simulations: one set uses the linear P-L relation as
the input P-L relation, and the other uses the non-linear P-L relation with a break at
$\log(P_0) = 1.0$. These simulations mimic the period distribution and the observed dispersion
along the P-L relation of the real data. The details of constructing these simulations can be found
in Ngeow & Kanbur (2006c). In each set, a large number of simulations is run (typically 1000) and
the break period (in $\log[P_0]$) is searched for with the SIC method. If the break period cannot be
found, this implies that the linear P-L relation is the preferred model, and vice versa. The top
panels of Figure 2 display the distributions of the break periods from these two sets of
simulations. For the case of a linear P-L relation, the SIC method did not find any break period
~90% of the time, while for the case of a non-linear P-L relation, the SIC method detects a range of
break periods with a peak at $\log(P_0) \sim 1.0$. Therefore, the SIC method can be used to
correctly identify a P-L relation that is either intrinsically linear or non-linear at a given break
period.
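A toy generator for fake data of this kind is sketched below. It is not the simulation procedure of Ngeow & Kanbur (2006c): the slopes, zero-point, dispersion and period range are hypothetical placeholder values, and the periods are drawn uniformly in log(P) rather than from the observed period distribution.

```python
import random


def simulate_pl(n, log_p_break=1.0, slope_short=-2.9, slope_long=-2.5,
                zero_point=17.5, sigma=0.2, seed=0):
    """Draw n fake (log P, magnitude) pairs from a continuous two-slope relation."""
    rng = random.Random(seed)
    mag_at_break = zero_point + slope_short * log_p_break
    data = []
    for _ in range(n):
        log_p = rng.uniform(0.4, 1.7)              # uniform in log P (assumption)
        if log_p < log_p_break:
            mean = zero_point + slope_short * log_p
        else:
            mean = mag_at_break + slope_long * (log_p - log_p_break)
        data.append((log_p, rng.gauss(mean, sigma)))  # Gaussian scatter about the line
    return data
```

Setting `slope_long` equal to `slope_short` yields an intrinsically linear relation, so the same function can produce both kinds of input for repeated trials with different seeds.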

The relatively large dispersion around $\log(P_0) \sim 1.0$ and the long tail toward shorter
periods exhibited in the top panels of Figure 2 can be due to a combination of two effects: (1) the
existence of the intrinsic dispersion along the P-L relation; and (2) the non-uniform distribution
of the periods in the data (see Ngeow et al. 2005 and Ngeow & Kanbur 2006d for more discussion about
the period distribution for Cepheid variables). To portray the impact of these effects on the
application of the SIC method for detecting the break period, we ran two additional experiments. The
first retains the original period distribution but excludes the intrinsic dispersion of the P-L
relation (the random photometric errors, however, still persist in the simulation), while the second
uses a uniform period distribution (in log[P]) and retains the intrinsic dispersion of the P-L
relation. The resulting distributions of the break period from the SIC method are presented in the
bottom panels of Figure 2. It can be seen that the long tail of the distribution present in the top
panels is reduced when a uniform period distribution is assumed. Furthermore, if the intrinsic
dispersion does not exist in the Cepheid P-L relation, then the SIC method is very efficient at
detecting the intrinsic break period (at $\log[P_0] = 1.0$ in our case). In reality, the intrinsic
dispersion along the P-L relation cannot be eliminated or reduced (at least in the optical bands)
and the period distribution of the Cepheid variables will not be uniform (for the reasons discussed
in Ngeow & Kanbur 2006c). We emphasize that theoretical pulsation modeling is needed to identify the
location of the break period or to confirm the break period at $\log(P_0) \sim 1.0$
(Ngeow et al. 2005).

3. DATA AND RESULTS 

In this section, we apply both the testimator and SIC methods to real LMC Cepheid data in order to
investigate whether the V-band Cepheid P-L relation at mean light is non-linear. We concentrate on
the V-band mean light data because the data available in the literature are mostly V-band mean
light data, and also because the evidence for non-linearity as a function of phase is clear
(Ngeow & Kanbur 2006a). The data sets used in this study include the OGLE data adopted from
Kanbur & Ngeow (2006) and the MACHO data adopted from Ngeow et al. (2005). They are referred to as
the "OGLE" sample (with 641 Cepheids) and the "MACHO" sample (with 1216 Cepheids), respectively.
Note that both data sets have been corrected for extinction using the methods described in the
corresponding papers. It is also important to point out that these two are independent data sets.
To investigate the influence of longer period stars on our tests, as well as to increase the number
of Cepheids in the OGLE sample, we append the data from Sebo et al. (2002) to the OGLE sample after
removing Cepheids duplicated in both samples and correcting for extinction. This third data set is
called the "OGLE+SEBO" sample (with 723 Cepheids) and it extends to $\log(P) \sim 2.0$.

The results of applying the testimator method to these three LMC Cepheid data sets are summarized
in Table 2, which has the same layout as Table 1. For the OGLE and OGLE+SEBO data sets, we tried
different subset sizes by dividing the samples into subsets of $n = 100$ and $n = 150$, referred to
as Test 1 and Test 2 in the table, respectively. In all cases, the testimator method implies that
there is a change of slope in the last subset of the samples. The similar results from Test 1 and
Test 2 suggest that our results are not affected by the size of each subset. This indicates that
the LMC P-L relation becomes non-linear as the period increases through 10 days to longer periods.
The last subset also brackets the fiducial break period at or around 10 days: this is consistent
with the previous work of Ngeow et al. (2005).

The results from using the SIC method are presented in Figure 3 and Table 3 for the same data sets.
In Figure 3 the values of $SIC(H_0)$ and $SIC(H_A)$ are plotted as a function of the chosen break
period, $\log(P_0)$. Since $SIC(H_0)$ is independent of $\log(P_0)$, it appears as a straight
horizontal line in the figure; the values of $SIC(H_0)$ for the three data sets are given in
Table 3. For $SIC(H_A)$ as a function of $\log(P_0)$, the figure shows that there is a range of
$\log(P_0)$ over which the values of $SIC(H_A)$ are smaller than $SIC(H_0)$ in all three data sets.
This implies that the non-linear P-L relation is preferred within these period ranges. This result
also reinforces the finding from Figure 2 that it is difficult to determine the exact break period
of the P-L relation, if it is present, with the SIC method (see also Section 2.2.1). The minimum
values of $SIC(H_A)$ found from the figure, and the corresponding $\log(P_0)$, are summarized in
Table 3 as well. The conn-
Fig. 2. - Distributions of the estimated break periods, $\log(P_0)$, using the SIC method with various simulations. The top panels show the
histograms from two simulations in which the input P-L relation is linear (dashed histogram) and non-linear with a break
period at $\log(P_0) = 1.0$ (thick histogram), respectively. The bottom panels show the histograms from two additional simulations using the
same non-linear P-L relation, again with a break period at $\log(P_0) = 1.0$, as the input P-L relation: one simulation has a uniform distribution of
the periods (in log[P]) in the simulated data, and the other does not include the intrinsic dispersion of the P-L relation. The right
panels show a blown-up region of the left panels.



TABLE 2
Testimator results for the real data sets.

Subset  Period range   n    $\hat{\beta}$    $\beta_0$  $|t_{\rm observed}|$  $t_{\rm critical}$  $k$    Decision   $\hat{\beta}_w$
(1)     (2)            (3)  (4)              (5)        (6)                   (7)                 (8)    (9)        (10)

OGLE sample, Test 1
1       0.4022-0.4771  100  -1.427 ± 0.967   ...        ...                   ...                 ...    ...        ...
2       0.4787-0.5293  100  -2.273 ± 1.399   -1.427     0.605                 2.627               0.230  accept H0  -1.622
3       0.5294-0.5889  100  -0.746 ± 1.095   -1.622     0.800                 2.627               0.304  accept H0  -1.355
4       0.5891-0.6703  100  -1.887 ± 0.675   -1.355     0.788                 2.627               0.300  accept H0  -1.515
5       0.6704-0.7891  100  -3.055 ± 0.703   -1.515     2.193                 2.627               0.835  accept H0  -2.801
6       0.7900-1.6768  141  -2.462 ± 0.082   -2.801     4.106                 2.612               1.572  reject H0  ...

OGLE sample, Test 2
1       0.4022-0.5043  150  -2.547 ± 0.647   ...        ...                   ...                 ...    ...        ...
2       0.5043-0.5889  150  -1.783 ± 0.641   -2.547     1.193                 2.421               0.493  accept H0  -2.171
3       0.5891-0.7083  150  -2.347 ± 0.401   -2.171     0.438                 2.421               0.181  accept H0  -2.203
4       0.7103-1.6768  191  -2.590 ± 0.075   -2.203     5.139                 2.415               2.128  reject H0  ...

OGLE+SEBO sample, Test 1
1       0.4022-0.4746  100  -0.989 ± 0.882   ...        ...                   ...                 ...    ...        ...
2       0.4752-0.5242  100  -2.476 ± 1.202   -0.989     1.237                 2.693               0.459  accept H0  -1.672
3       0.5245-0.5729  100  -4.743 ± 1.339   -1.672     2.292                 2.693               0.851  accept H0  -4.286
4       0.5734-0.6469  100  -2.743 ± 0.907   -4.286     1.701                 2.693               0.632  accept H0  -3.311
5       0.6491-0.7320  100  -2.921 ± 0.933   -3.311     0.418                 2.693               0.155  accept H0  -3.250
6       0.7330-0.9071  100  -3.315 ± 0.400   -3.250     0.162                 2.693               0.060  accept H0  -3.254
7       0.9112-2.1268  123  -2.497 ± 0.089   -3.254     8.535                 2.682               3.181  reject H0  ...

OGLE+SEBO sample, Test 2
1       0.4022-0.4977  150  -2.545 ± 0.546   ...        ...                   ...                 ...    ...        ...
2       0.4891-0.5729  150  -2.826 ± 0.706   -2.545     0.398                 2.529               0.157  accept H0  -2.589
3       0.5734-0.6831  150  -2.557 ± 0.432   -2.589     0.073                 2.529               0.029  accept H0  -2.588
4       0.6831-0.9071  150  -3.153 ± 0.253   -2.588     2.234                 2.529               0.883  accept H0  -3.087
5       0.9112-2.1268  123  -2.497 ± 0.089   -3.087     6.651                 2.536               2.623  reject H0  ...

MACHO sample
1       0.4008-0.4715  200  -2.391 ± 0.958   ...        ...                   ...                 ...    ...        ...
2       0.4719-0.5226  200  -1.843 ± 1.189   -2.391     0.462                 2.601               0.178  accept H0  -2.294
3       0.5231-0.5787  200  -2.623 ± 1.127   -2.294     0.292                 2.601               0.112  accept H0  -2.331
4       0.5795-0.6851  200  -1.851 ± 0.809   -2.331     0.594                 2.601               0.228  accept H0  -2.222
5       0.6588-0.7891  200  -2.948 ± 0.524   -2.222     1.385                 2.601               0.533  accept H0  -2.608
6       0.7910-1.4501  216  -2.123 ± 0.122   -2.608     3.991                 2.599               1.536  reject H0  ...
Fig. 3. - Results of SIC(H_0) and SIC(H_A) as a function of the
chosen break period, log(P_0), for the three LMC Cepheid data sets.
The thick horizontal lines are the results for SIC(H_0), which are
independent of the chosen break period. The curves are the results
for SIC(H_A). The vertical dotted lines mark the break period chosen
in the literature (e.g., Tammann & Reindl 2002; Kanbur & Ngeow 2004;
Sandage et al. 2004).

Fig. 4. - Histograms resulting from the bootstrap re-sampling at the
break period given in Table 3 for the three LMC Cepheid data sets.
See text for details.

Fig. 5. - Comparison of the histograms from three sets of simulations
for the MACHO data: (1) a simulation that takes a non-linear P-L
relation with a break at log(P_0) = 1.0 as the input P-L relation,
with the intrinsic dispersion included; (2) a simulation with a linear
P-L relation as the input P-L relation, with the intrinsic dispersion
included; and (3) a simulation that takes a non-linear P-L relation
with a break at log(P_0) = 1.0 as the input P-L relation but without
the intrinsic dispersion. Unlike the other simulations in this paper,
the periods that go into these simulations are from the actual MACHO
data.

dence intervals for the break period can be estimated using bootstrap
re-sampling methods. For the model with the log(P_0) given in Table 3,
the errors (residuals) of the regression, $e_i = m_i - \hat{m}_i$, are
randomly drawn (with replacement) to construct a "new" data set, and a
new break period is estimated. This is repeated many times to build up
the distribution of the break periods. The resulting histograms for the
three data sets are presented in Figure 4. From these distributions,
the 5th, 25th, 75th and 95th percentiles are estimated for each data
set. The results are given in the last four columns of Table 3. At
first glance the break period found from the MACHO data seems to be
inconsistent with the OGLE and OGLE+SEBO results. This is due to the
difficulty of accurately estimating the break period in the presence of
the instability strip. To demonstrate this, we use the exact periods in
the MACHO data as input periods to our simulations, and generate three
different sets of simulations: (1) a simulation with an intrinsically
non-linear P-L relation; (2) a simulation with a linear P-L relation;
and (3) a simulation with an intrinsically non-linear P-L relation but
without the intrinsic dispersion. The resulting histograms for these
three sets of simulations are displayed in Figure 5. From this figure
it is clear that the break period we find for the MACHO data is not
inconsistent with the OGLE and OGLE+SEBO results. The break period
found in the data, log(P_0) = 0.833, is within the range of the break
periods found from the simulations. This figure also portrays the
difficulty of estimating the break period from real data when the
intrinsic dispersion along the P-L relation is present. Therefore, the
break periods given in Table 3 are consistent with the results from the
testimator (Table 2), the result from the non-linear estimation
procedure applied in Ngeow et al. (2005; log(P_0) = 0.934 with upper
and lower 95% confidence levels of 1.089 and 0.778, respectively) and
the adopted log(P_0) = 1.0 in the literature. Note that in previous
studies (e.g., Tammann & Reindl 2002; Kanbur & Ngeow 2004, 2006;
Sandage et al. 2004; Ngeow et al. 2005; Ngeow & Kanbur 2006b) the break
period is conveniently chosen to be at 10 days, which is represented as
the dotted vertical line in Figure 3. The SIC results also support the
non-linear P-L relation as the preferred model at log(P_0) = 1.0.
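
The break-period search and the residual bootstrap described above can
be sketched as follows. This is a minimal illustration under stated
assumptions, not the code used for the results in this paper: the SIC
is taken in the Gaussian-likelihood form n ln(RSS/n) + p ln(n) (additive
constants dropped), the alternative model H_A is taken to be two
independent straight lines (p = 4) against a single line under H_0
(p = 2), and the data are simulated with arbitrary slopes, zero point
and dispersion. All function names are hypothetical.

```python
import math
import random


def line_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def sic(rss, n, p):
    """Assumed SIC form: n ln(RSS/n) + p ln(n), constants dropped."""
    return n * math.log(rss / n) + p * math.log(n)


def two_line_fit(x, y, brk):
    """Fit separate lines below and above brk; returns (fitted, rss)."""
    short = [i for i, xi in enumerate(x) if xi < brk]
    long_ = [i for i, xi in enumerate(x) if xi >= brk]
    fitted, rss = [0.0] * len(x), 0.0
    for idx in (short, long_):
        a, b, r = line_fit([x[i] for i in idx], [y[i] for i in idx])
        rss += r
        for i in idx:
            fitted[i] = a + b * x[i]
    return fitted, rss


def best_break(x, y, grid):
    """Return (SIC(H_A), break) for the grid value with the lowest SIC."""
    n = len(x)
    return min((sic(two_line_fit(x, y, g)[1], n, 4), g) for g in grid)


def bootstrap_breaks(x, y, grid, n_boot, rng):
    """Resample residuals with replacement and re-estimate the break."""
    _, brk = best_break(x, y, grid)
    fitted, _ = two_line_fit(x, y, brk)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    out = []
    for _ in range(n_boot):
        y_new = [fi + rng.choice(resid) for fi in fitted]
        out.append(best_break(x, y_new, grid)[1])
    return sorted(out)


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list."""
    k = max(0, math.ceil(q * len(sorted_vals) / 100) - 1)
    return sorted_vals[k]


# Simulated data: arbitrary two-slope relation with a break at log(P) = 1.
rng = random.Random(1)
x = [rng.uniform(0.4, 1.5) for _ in range(200)]
y = [17.0 + (-3.0 if xi < 1.0 else -2.0) * (xi - 1.0) + rng.gauss(0.0, 0.03)
     for xi in x]
grid = [0.6 + 0.05 * j for j in range(15)]  # trial breaks, 0.60 to 1.30

sic_h0 = sic(line_fit(x, y)[2], len(x), 2)
sic_ha, brk = best_break(x, y, grid)
breaks = bootstrap_breaks(x, y, grid, 100, rng)
print("SIC(H_0) =", sic_h0, " SIC(H_A) =", sic_ha, " break =", brk)
print("percentiles:", [percentile(breaks, q) for q in (5, 25, 75, 95)])
```

The model with the lower SIC is preferred, and the spread of the
bootstrap break periods plays the role of the percentile columns in
Table 3. Increasing the simulated dispersion broadens that spread,
which is the effect discussed for the MACHO data.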

TABLE 3

SIC results for the real data sets.

Data set           SIC(H_0)  SIC(H_A)  log(P_0)  5th percentile  25th percentile  75th percentile  95th percentile
OGLE sample        -179.29   -188.86   1.041     0.550           1.002            1.041            1.101
OGLE+SEBO sample   -296.43   -304.28   1.041     0.560           0.922            1.052            1.131
MACHO sample        182.61    156.20   0.833     0.806           0.826            0.838            0.936

4. CONCLUSION AND DISCUSSION

Applying two additional statistical approaches, the method of
testimators and the SIC, to the OGLE, the OGLE+SEBO and the MACHO
Cepheid data, we have found strong statistical evidence for a change
of slope in the Cepheid P-L relation in the LMC at longer periods.
These results also strongly support the previous results obtained from
the F-test. Therefore, the observed LMC P-L relation is non-linear
based on these rigorous statistical tests. This implies that either the
LMC P-L relation is indeed non-linear or there are some hidden factors
in the analysis (see Ngeow & Kanbur 2006c for more discussion on this).
Furthermore, the break periods, or the ranges of permissible break
periods, found in this study are consistent with the conveniently
chosen break period of 10 days in previous studies. However, our study,
with both real and simulated data, implies that it is difficult to
accurately estimate the break period with either the testimator or the
SIC method. This is due to the existence of the intrinsic dispersion
along the P-L relation. Confirmation of the break period at/around 10
days has to come from stellar pulsation modeling studies.

The implications of a non-linear LMC P-L relation for the
extra-galactic distance scale and the Hubble constant have been
discussed in Ngeow & Kanbur (2005) and Ngeow & Kanbur (2006b) and will
not be repeated here. A number of authors, including Spergel et al.
(2001), Tegmark et al. (2001), Macri et al. (2006), Olling (2006) and
the references therein, have commented on how an independent estimate
of the Hubble constant accurate to 1% can significantly reduce the
error bars on Ω, the total density of the universe. Applying the
correct form of the Cepheid P-L relation will help in reducing the
systematic error of the Hubble constant in future studies (Ngeow &
Kanbur 2006b). Over and above this, if the Cepheid P-L relation does
indeed have a change of slope at 10 days, it is important to understand
this from a stellar pulsation and evolution point of view and to
investigate fully the ramifications of this new feature (Kanbur & Ngeow
2006; Marconi et al. 2005).

Ngeow & Kanbur (2006c) have investigated various factors that may
cause the observed non-linear LMC P-L relation, including the influence
of outliers and the lack of longer period Cepheids in the sample.
However, the results from that study suggest that none of these factors
is responsible for the observed non-linear LMC P-L relation. We
emphasize that the samples used in our studies have been cleaned of
obvious outliers. Further, the testimator approach, which estimates the
slope with a variance smaller than that of the standard formula
(property 2 stated in Section 2.1.1), is able to minimize the effect of
(additional) outliers by smoothing. Regarding the lack of longer period
Cepheids in the sample, we have used the OGLE+SEBO sample as an
expansion of the OGLE sample with increased period coverage. Both
samples show the same testimator and SIC results. Therefore, we believe
this should not be the cause of the observed non-linear LMC P-L
relation.

The authors would like to thank the referee for use- 
ful suggestions. This research was supported in part 
by NASA through the American Astronomical Society's 
Small Research Grant Program. 



APPENDIX 

PROOF FOR THE PROPERTIES OF THE TESTIMATOR 

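We prove here the two properties of the testimator described in Section 2.1. To prove that the testimator is an
unbiased estimator under $H_0$, we note that the testimator from equation (1) is

$$\hat{\beta}_w = k(\hat{\beta} - \beta_0) + \beta_0,$$

where $k$ is defined in equation (3). Therefore the above expression can be re-written as

$$\hat{\beta}_w = \frac{|\hat{\beta} - \beta_0|\,(\hat{\beta} - \beta_0)}{t_{\alpha/2,\nu}\sqrt{\mathrm{MSE}/S_{xx}}} + \beta_0. \tag{A1}$$

This implies that (using the independence of $\hat{\beta}$ and MSE in normal linear regression)

$$E(\hat{\beta}_w) = \frac{\sqrt{S_{xx}}}{t_{\alpha/2,\nu}}\, E\!\left(\frac{1}{\sqrt{\mathrm{MSE}}}\right) E\!\left[|\hat{\beta} - \beta_0|\,(\hat{\beta} - \beta_0)\right] + \beta_0.$$

Since $E(|z|z) = 0$ for a variable $z = \hat{\beta} - \beta_0$ that is normally distributed with zero mean, we obtain from
the above expression

$$E(\hat{\beta}_w) = \beta_0$$

as desired. The second assertion states that $\mathrm{Var}(\hat{\beta}_w) < \mathrm{Var}(\hat{\beta})$. To prove this, we first
re-arrange equation (A1) such that

$$(\hat{\beta}_w - \beta_0)^2 = \frac{(\hat{\beta} - \beta_0)^4}{t^2_{\alpha/2,\nu}\,\mathrm{MSE}}\, S_{xx}.$$

Assume $\hat{\beta}$ is normally distributed with $N(\beta_0, \sigma_\beta^2)$ and define $Z = (\hat{\beta} - \beta_0)/\sigma_\beta$; then $Z$ has a
standard normal distribution, $N(0, 1)$. Noting that $\sigma_\beta^2 = \mathrm{Var}(\hat{\beta}) = \sigma^2/S_{xx}$, where $\sigma^2$ is the variance of
the linear regression $y = \beta x + a$, the above expression reduces to

$$(\hat{\beta}_w - \beta_0)^2 = Z^4\, \frac{\sigma^2}{\mathrm{MSE}}\, \frac{1}{t^2_{\alpha/2,\nu}}\, \frac{\sigma^2}{S_{xx}}.$$

Hence, we have

$$\mathrm{Var}(\hat{\beta}_w) = E(Z^4)\, E\!\left(\frac{\sigma^2}{\mathrm{MSE}}\right) \frac{1}{t^2_{\alpha/2,\nu}}\, \frac{\sigma^2}{S_{xx}}, \tag{A2}$$

as the last two terms are constants and $\mathrm{Var}(\hat{\beta}_w) = E[(\hat{\beta}_w - \beta_0)^2]$. For $E(Z^4)$, since the fourth moment of a
standard normal distribution (the kurtosis) is 3, $E(Z^4) = 3$. For $E(\sigma^2/\mathrm{MSE})$, we observe that

$$\frac{\sigma^2}{\mathrm{MSE}} = \frac{1}{\mathrm{MSE}/\sigma^2} = \frac{N - 2}{\sum_i \left[(y_i - \hat{\beta} x_i - \hat{a})/\sigma\right]^2}.$$

Therefore, $(N - 2)\,\mathrm{MSE}/\sigma^2$ is $\chi^2$ distributed with $\nu = N - 2$ degrees of freedom. It is well known that if $X$
is $\chi^2$ distributed with $\nu$ degrees of freedom, then $E(1/X) = 1/(\nu - 2)$; hence $E(\sigma^2/\mathrm{MSE}) = (N - 2)/(N - 4)$.
Recalling that $\sigma^2/S_{xx} = \mathrm{Var}(\hat{\beta})$, equation (A2) reduces to

$$\mathrm{Var}(\hat{\beta}_w) = 3\, \frac{N - 2}{N - 4}\, \frac{1}{t^2_{\alpha/2,\nu}}\, \mathrm{Var}(\hat{\beta}).$$

If $t_{\alpha/2,\nu} > \sqrt{3(N - 2)/(N - 4)}$, we then have

$$\mathrm{Var}(\hat{\beta}_w) < \mathrm{Var}(\hat{\beta})$$

as the assertion states. Due to the Bonferroni testing procedure, the condition $t_{\alpha/2n_g,\nu} > \sqrt{3(N - 2)/(N - 4)}$ is
satisfied when $N > 5$ and $\alpha < 0.1$.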
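
As a quick numerical check of this condition, the sketch below evaluates
the threshold sqrt(3(N-2)/(N-4)) and the ratio of the testimator variance
to the standard slope variance, 3(N-2)/((N-4) t^2), for a few pairs of
subset size and critical t value taken from columns (3) and (7) of
Table 2. Identifying N with the subset size n of Table 2 is an
assumption made here for illustration, and the function names are
hypothetical.

```python
import math


def threshold(n):
    """Critical t above which the testimator variance is the smaller one."""
    return math.sqrt(3.0 * (n - 2) / (n - 4))


def variance_ratio(n, t_critical):
    """Var(testimator) / Var(standard slope) = 3 (N-2) / ((N-4) t^2)."""
    return 3.0 * (n - 2) / ((n - 4) * t_critical ** 2)


# (subset size, critical t) pairs as tabulated in Table 2.
cases = [(100, 2.627), (150, 2.421), (200, 2.601)]
for n, t in cases:
    print(n, t, round(threshold(n), 3), round(variance_ratio(n, t), 3))
```

For each of these pairs the tabulated critical value exceeds the
threshold, so the ratio is below one.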
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